Arrays of (locality-sensitive) Count Estimators (ACE): High-Speed Anomaly Detection via Cache Lookups

نویسندگان

  • Chen Luo
  • Anshumali Shrivastava
چکیده

Anomaly detection is one of the frequent and important subroutines deployed in large-scale data processing systems. Even being a well-studied topic, existing techniques for unsupervised anomaly detection require storing significant amounts of data, which is prohibitive from memory and latency perspective. In the big-data world existing methods fail to address the new set of memory and latency constraints. In this paper, we propose ACE (Arrays of (locality-sensitive) Count Estimators) algorithm that can be 60x faster than the ELKI package [2], which has the fastest implementation of the unsupervised anomaly detection algorithms. ACE algorithm requires less than 4MB memory, to dynamically compress the full data information into a set of count arrays. These tiny 4MB arrays of counts are sufficient for unsupervised anomaly detection. At the core of the ACE algorithm, there is a novel statistical estimator which is derived from the sampling view of Locality Sensitive Hashing(LSH). This view is significantly different and efficient than the widely popular view of LSH for near-neighbor search. We show the superiority of ACE algorithm over 11 popular baselines on 3 benchmark datasets, including the KDD-Cup99 data which is the largest available benchmark comprising of more than half a million entries with ground truth anomaly labels.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Network Processor based Router and the Cache Design: Implementation and Evaluation

High performance routers are mostly implemented with network processors because of their software programmability, hardware computation power, and high bandwidth interface design. In this paper, a 5-dimensional packet classification algorithm based on the hierarchal binary prefix search is first implemented in IXP1200 network processor. Our classification implementation is faster and smaller th...

متن کامل

Cache Efficient Bloom Filters for Shared Memory Machines

Bloom filters are a well known data-structure that supports approximate set membership queries that report no false negatives. Each element in the universe represented by the bloom filter is associated with k random bits in the structure. Traditional bloom filters, therefore, require k non-local memory operations to insert an element or perform a lookups. For very large bloom filters, these k l...

متن کامل

Design and Analysis of Hashing Algorithms with Cache Eeects

This paper investigates the performance of hashing algorithms by both an experimental and an analytical approach. We examine the performance of three classical hashing algorithms: chaining, double hashing and linear probing. Our experimental results show that, despite the theoretical superiority of chaining and double hashing, linear probing outperforms both for random lookups. We explore varia...

متن کامل

Fast indexing for blocked array layouts to reduce cache misses

The increasing disparity between memory latency and processor speed is a critical bottleneck in achieving high performance. Recently, several studies have been conducted on blocked data layouts, in conjunction with loop tiling to improve locality of references. In this paper, we further reduce cache misses, restructuring the memory layout of multi-dimensional arrays, so that array elements are ...

متن کامل

Case Studies on Cache Performance and Optimization of Programs with Unit Strides

Cache performance in modern computers is important for program efficiency. A cache is thrashing if a significant amount of time is spent moving data between the memory and the cache. This paper presents two cache thrashing examples, one in scientific computing and one in image processing, both of which involve several one-dimensional arrays that are accessed sequentially, i.e., with unit stride...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1706.06664  شماره 

صفحات  -

تاریخ انتشار 2017